Nonconvergence to saddle boundary points under perturbed reinforcement learning
Abstract
For several classes of reinforcement learning schemes, convergence to action profiles that are not Nash equilibria may occur with positive probability under certain conditions on the payoff function. In this paper, we explore how an alternative reinforcement learning scheme, in which the strategy of each agent is also perturbed by a strategy-dependent perturbation (or mutation) function, may exclude convergence to non-Nash pure strategy profiles. This approach extends prior analysis of reinforcement learning in games that addresses the issue of convergence to saddle boundary points. It further provides a framework under which the effect of mutations can be analyzed in the context of reinforcement learning.
JEL classifications: C72, C73, D83
Similar resources
How to Escape Saddle Points Efficiently
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost “dimension-free”). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are...
Multiple solutions for a perturbed Navier boundary value problem involving the $p$-biharmonic
The aim of this article is to establish the existence of at least three solutions for a perturbed $p$-biharmonic equation depending on two real parameters. The approach is based on variational methods.
Attainability of boundary points under reinforcement learning
This paper investigates the properties of the most common form of reinforcement learning (the “basic model” of Erev and Roth, American Economic Review, 88, 848-881, 1998). Stochastic approximation theory has been used to analyse the local stability of fixed points under this learning process. However, as we show, when such points are on the boundary of the state space, for example, pure strateg...
Convergence Analysis of a Randomly Perturbed Infomax Algorithm for Blind Source Separation
We present a novel variation of the well-known infomax algorithm of blind source separation. Under natural gradient descent, the infomax algorithm converges to a stationary point of a limiting ordinary differential equation. However, due to the presence of saddle points or local minima of the corresponding likelihood function, the algorithm may be trapped around these “bad” stationary points fo...
A method based on the meshless approach for singularly perturbed differential-difference equations with Boundary layers
In this paper, an effective procedure based on coordinate stretching and radial basis functions (RBFs) collocation method is applied to solve singularly perturbed differential-difference equations with layer behavior. It is well known that if the boundary layer is very small, for good resolution of the numerical solution at least one of the collocation points must lie in the boundary layer. In ...
Journal title:
- Int. J. Game Theory
Volume 44, Issue
Pages -
Publication date: 2015